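Scene text erasing aims to remove text content from scene images, and current state-of-the-art text erasing models are trained on large-scale synthetic data. Although data synthesis engines can provide large numbers of annotated training samples, there is a gap between synthetic and real-world data. In this paper, we apply self-supervision to unlabeled real-world scene text images to learn feature representations. A novel pretext task is designed to keep the text masks of image variants consistent. We design a progressive erasing network to remove residual text. Scene text is erased progressively by exploiting intermediate generated results, which lay the foundation for subsequent higher-quality results. Experiments show that our method significantly improves the generalization of text erasing and achieves state-of-the-art performance on public benchmarks.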
translated by Google Translate
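The abstract does not say which image variants or which consistency measure the pretext task uses. The sketch below assumes a horizontal flip as the variant and mean squared error as the measure. It flips the mask predicted on the variant back into the original frame and penalizes any disagreement. `hflip` and `mask_consistency_loss` are hypothetical names, and plain Python lists stand in for tensors.

```python
from typing import List

Mask = List[List[float]]  # soft text mask: values in [0, 1], indexed [row][col]


def hflip(mask: Mask) -> Mask:
    """Mirror a mask left-to-right (the assumed image-variant transform)."""
    return [list(reversed(row)) for row in mask]


def mask_consistency_loss(mask_orig: Mask, mask_variant: Mask) -> float:
    """Mean squared disagreement between the mask predicted on the original
    image and the mask predicted on its horizontally flipped variant,
    after flipping the latter back into the original frame."""
    aligned = hflip(mask_variant)
    total, count = 0.0, 0
    for row_a, row_b in zip(mask_orig, aligned):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


# A perfectly consistent model predicts the flipped mask on the flipped image.
m = [[0.0, 1.0], [0.5, 0.0]]
print(mask_consistency_loss(m, hflip(m)))  # 0.0
```

Because the loss needs no text annotations, it can be computed on unlabeled real-world images. This is what makes a pretext task of this kind usable for self-supervision.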
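Document layout analysis (DLA) plays an important role in information extraction and document understanding. Document layout analysis has already reached milestone results, but layout analysis of non-Manhattan documents remains a challenge. In this paper, we propose an image layer modeling method to address this challenge. To evaluate the proposed image layer modeling method, we propose FPD, a manually labeled fine-grained segmentation dataset of non-Manhattan layouts. To the best of our knowledge, FPD is the first manually labeled fine-grained segmentation dataset of non-Manhattan layouts. To effectively extract fine-grained features of documents, we propose an edge embedding network named L-E^3Net. Experimental results show that the proposed image layer modeling method handles fine-grained segmentation of non-Manhattan documents better.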
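Vision Transformer (ViT) has become a new paradigm in computer vision. It delivers excellent performance but also incurs expensive computational cost. Image token pruning is one of the main approaches to ViT compression for two reasons: the complexity is quadratic in the number of tokens, and many tokens that contain only background regions do not really contribute to the final prediction. Existing works either rely on additional modules to score the importance of individual tokens, or apply a fixed-ratio pruning strategy to all input instances. In this work, we propose an adaptive sparse token pruning framework with minimal cost. Our method is based on learnable thresholds and leverages multi-head self-attention to evaluate token informativeness with few additional operations. Specifically, we first propose an inexpensive class attention scoring mechanism weighted by attention-head importance. Then, learnable parameters are inserted into the ViT as thresholds to distinguish informative tokens from unimportant ones. By comparing token attention scores with the thresholds, we can discard useless tokens hierarchically and thus accelerate inference. The learnable thresholds are optimized in budget-aware training to balance accuracy and complexity, and they yield a corresponding pruning configuration for each input instance. Extensive experiments demonstrate the effectiveness of our method. For example, our method improves the throughput of DeiT-S by 50% with only a 0.2% drop in top-1 accuracy, a better trade-off between accuracy and latency than previous methods.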
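The scoring-and-thresholding step can be made concrete with a minimal sketch of one layer's pruning decision. The sketch assumes that per-head importance weights are already available and that the class token is always kept. It does not show how the paper computes head importance or how the threshold is learned during budget-aware training. `class_attention_scores`, `prune_tokens`, and the numbers in the example are illustrative only.

```python
from typing import List, Sequence


def class_attention_scores(cls_attn: Sequence[Sequence[float]],
                           head_weights: Sequence[float]) -> List[float]:
    """cls_attn[h][j] is the attention from the class token to token j in head h.
    Heads are combined with (hypothetical) importance weights, normalized to sum to 1."""
    total_w = sum(head_weights)
    num_tokens = len(cls_attn[0])
    return [sum(w * cls_attn[h][j] for h, w in enumerate(head_weights)) / total_w
            for j in range(num_tokens)]


def prune_tokens(tokens: Sequence[str], scores: Sequence[float],
                 threshold: float) -> List[str]:
    """Keep the class token (index 0) and every token whose score exceeds the threshold."""
    return [tok for j, (tok, s) in enumerate(zip(tokens, scores))
            if j == 0 or s > threshold]


cls_attn = [[0.1, 0.6, 0.2, 0.1],   # head 0
            [0.1, 0.2, 0.6, 0.1]]   # head 1
scores = class_attention_scores(cls_attn, head_weights=[3.0, 1.0])
print(prune_tokens(["cls", "p1", "p2", "p3"], scores, threshold=0.25))
# ['cls', 'p1', 'p2']
```

The kept set depends on each image's own attention scores, so the number of surviving tokens varies per input. A fixed-ratio strategy, by contrast, always keeps the same number.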
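Combining multiple sensors enables a robot to maximize its perceptual awareness of the environment and enhances its robustness to external disturbances, which is crucial for robotic navigation. This paper proposes the FusionPortable benchmark, a complete multi-sensor dataset with a diverse set of sequences for mobile robots. This paper makes three contributions. First, we develop a portable and versatile multi-sensor suite that offers rich sensory measurements: 10 Hz LiDAR point clouds, 20 Hz stereo frame images, high-rate asynchronous events from stereo event cameras, 200 Hz inertial readings from an IMU, and 10 Hz GPS signals. The sensors are temporally synchronized in hardware. The device is lightweight and self-contained, and it offers plug-and-play support for mobile robots. Second, we construct a dataset of 17 sequences covering a variety of campus environments, collected with multiple robot platforms. Some sequences are challenging for existing SLAM algorithms. Third, we provide ground truth for evaluating localization and mapping performance. We also evaluate state-of-the-art SLAM methods and identify their limitations. The dataset will be released at https://ram-lab.com/file/site/site/multi-sensor-dataset and consists of raw sensor data, ground truth, calibration data, and evaluated algorithms.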
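We introduce a power-of-two low-bit post-training quantization (PTQ) method for deep neural networks that meets hardware requirements and does not require long retraining. Power-of-two quantization converts the multiplications introduced by quantization and dequantization into bit shifts, which many efficient accelerators adopt. However, power-of-two scale factors have fewer candidate values, which leads to larger rounding or clipping errors. We propose a novel power-of-two PTQ framework, dubbed RAPQ, which dynamically adjusts the power-of-two scales of the whole network instead of statically determining them layer by layer. In theory, this lets it trade off rounding error against clipping error across the whole network. Meanwhile, the reconstruction method in RAPQ is based on the BN information of each unit. Extensive experiments on ImageNet demonstrate the strong performance of the proposed method. Without bells and whistles, RAPQ reaches accuracies of 65% and 48% on ResNet-18 and MobileNetV2, respectively, with INT2 weights and INT4 activations. We are the first to propose the more constrained but hardware-friendly power-of-two quantization scheme for low-bit PTQ, and we show that it can achieve nearly the same accuracy as SOTA PTQ methods. The code has been released.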
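A few lines of code show the trade-off between rounding and clipping error. The sketch below quantizes with a scale restricted to 2**k and picks k for a single tensor by minimizing squared reconstruction error. It also shows how a power-of-two rescale of an integer accumulator reduces to a bit shift. This per-tensor search is the static, layer-by-layer baseline that the abstract contrasts RAPQ with. It is not RAPQ's whole-network adjustment or its BN-based reconstruction, and all function names are hypothetical.

```python
import math
from typing import Sequence


def quantize(x: float, k: int, bits: int) -> int:
    """Signed uniform quantization with power-of-two scale 2**k."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = round(x / 2.0 ** k)          # rounding error (Python rounds half to even)
    return max(qmin, min(qmax, q))   # clipping error


def dequantize(q: int, k: int) -> float:
    return q * 2.0 ** k


def best_exponent(values: Sequence[float], bits: int, candidates: range) -> int:
    """Pick the exponent k that minimizes squared reconstruction error for one tensor."""
    def mse(k: int) -> float:
        return sum((v - dequantize(quantize(v, k, bits), k)) ** 2
                   for v in values) / len(values)
    return min(candidates, key=mse)


def requantize_shift(acc: int, shift: int) -> int:
    """Multiply an integer accumulator by 2**-shift using only shifts (round half up)."""
    if shift <= 0:
        return acc << -shift
    return (acc + (1 << (shift - 1))) >> shift


w = [0.5, -0.25, 1.0]
k = best_exponent(w, bits=4, candidates=range(-4, 2))
print(k, [quantize(v, k, 4) for v in w])   # -2 [2, -1, 4]
print(requantize_shift(13, 2))             # 3  (13 / 4 = 3.25, computed with a shift)
```

With k = -2 every value is represented exactly. With k = -3 the value 1.0 is clipped to 7 * 2**-3 = 0.875, and with k = -1 the value -0.25 rounds to 0. Because only integer exponents are available, a tensor can easily fall between two poor choices, which is the problem the abstract describes.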
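High-definition (HD) maps can provide autonomous driving with precise geometric and semantic information about the static traffic environment. The road boundary is one of the most important pieces of information in HD maps, because it separates road areas from off-road areas and can guide vehicles to drive within the road area. However, annotating road boundaries for HD maps at city scale is labor-intensive. To enable automatic HD map annotation, current works use semantic segmentation or iterative graph growing for road boundary detection. However, the former cannot ensure topological correctness because it works at the pixel level, while the latter suffers from inefficiency and drifting issues. To address these problems, in this letter we propose a new system, termed CSBoundary, that automatically detects road boundaries at city scale for HD map annotation. Our network takes an aerial image patch as input and directly infers a continuous road boundary graph (i.e., vertices and edges) from it. To generate the city-scale road boundary graph, we stitch together the graphs obtained from all image patches. CSBoundary is evaluated and compared on a public benchmark dataset, and the results demonstrate its superiority. An accompanying demo video is available on our project page: https://sites.google.com/view/csbound/.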
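The abstract states that city-scale graphs are produced by stitching per-patch graphs, but not how. One simple approach, sketched here under assumed conventions, has two steps. First, shift each patch's vertices by the patch offset. Second, merge vertices from neighboring patches that land within a small radius of each other. The `stitch_graphs` function, the patch tuple layout, and `merge_radius` are all hypothetical. The linear search over existing vertices is kept for clarity; a spatial index would be needed at city scale.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]
Edge = Tuple[int, int]
# One patch: (top-left offset in global pixels, local vertices, edges as index pairs)
Patch = Tuple[Point, Sequence[Point], Sequence[Edge]]


def stitch_graphs(patches: Sequence[Patch],
                  merge_radius: float) -> Tuple[List[Point], List[Edge]]:
    vertices: List[Point] = []
    edges: Set[Edge] = set()
    for (ox, oy), local_vertices, local_edges in patches:
        local_to_global: List[int] = []
        for x, y in local_vertices:
            gx, gy = x + ox, y + oy  # shift into global coordinates
            # Reuse an existing vertex within merge_radius (e.g. duplicated in an overlap).
            match = next((i for i, (vx, vy) in enumerate(vertices)
                          if (vx - gx) ** 2 + (vy - gy) ** 2 <= merge_radius ** 2),
                         None)
            if match is None:
                vertices.append((gx, gy))
                match = len(vertices) - 1
            local_to_global.append(match)
        for a, b in local_edges:
            u, v = local_to_global[a], local_to_global[b]
            if u != v:  # drop edges collapsed by merging
                edges.add((min(u, v), max(u, v)))
    return vertices, sorted(edges)


left = ((0.0, 0.0), [(0.0, 0.0), (10.0, 0.0)], [(0, 1)])
right = ((10.0, 0.0), [(0.5, 0.0), (10.0, 0.0)], [(0, 1)])
print(stitch_graphs([left, right], merge_radius=1.0))
# ([(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)], [(0, 1), (1, 2)])
```

In the example, the boundary vertex at (10.5, 0) in the right patch is merged with the vertex at (10, 0) from the left patch. The two segments therefore join into one continuous polyline.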
Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and materials science. However, accurately modeling the distribution and rapidly generating novel molecular graphs remain crucial yet challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for the reverse generative process. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. In particular, the proposed method still generates high-quality molecular graphs with a limited number of steps.
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide applications and considerable popularity. However, the search space grows combinatorially when the utility threshold is low or the data are large-scale, so solving the HUSPM problem can be time-consuming and memory-intensive. Several algorithms have been proposed to address this problem, but they still incur high running time and memory usage. In this paper, to solve this problem more efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP uses a compact seq-array to store the necessary information of a sequence database. The seqPro structure is designed to efficiently calculate the utilities and upper-bound values of candidate patterns. Furthermore, HUSP-SP uses a new upper bound on utility, namely the tighter reduced sequence utility (TRSU), and two search-space pruning strategies to improve its mining performance. Experimental results on both synthetic and real-life datasets show that HUSP-SP significantly outperforms state-of-the-art algorithms in terms of running time, memory usage, search-space pruning efficiency, and scalability.
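For readers new to HUSPM, the quantity that the upper bounds prune against is a pattern's utility. Under the standard definition, each item in a quantitative sequence carries a utility (quantity times unit profit). A pattern's utility in one sequence is the maximum over all of its occurrences, and its database utility is the sum over sequences. The brute-force recursion below computes exactly that. It is a reference sketch, not the seq-array or seqPro structure from the paper, and its names are hypothetical.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, Optional, Sequence

QSequence = Sequence[Dict[str, int]]  # each itemset maps item -> utility (quantity x unit profit)
Pattern = Sequence[FrozenSet[str]]


def pattern_utility(seq: QSequence, pattern: Pattern) -> Optional[int]:
    """Maximum utility of `pattern` over all of its occurrences in `seq`,
    or None if the pattern does not occur."""

    @lru_cache(maxsize=None)
    def best(p_idx: int, s_start: int) -> Optional[int]:
        if p_idx == len(pattern):
            return 0
        result: Optional[int] = None
        for s_idx in range(s_start, len(seq)):
            itemset = seq[s_idx]
            if pattern[p_idx].issubset(itemset):   # all pattern items present here
                rest = best(p_idx + 1, s_idx + 1)  # match remaining itemsets later
                if rest is not None:
                    gain = sum(itemset[i] for i in pattern[p_idx]) + rest
                    result = gain if result is None else max(result, gain)
        return result

    return best(0, 0)


def database_utility(db: Sequence[QSequence], pattern: Pattern) -> int:
    """Utility of a pattern in a database: the sum of its per-sequence utilities."""
    total = 0
    for seq in db:
        u = pattern_utility(seq, pattern)
        if u is not None:
            total += u
    return total


s1 = [{"a": 2, "b": 3}, {"a": 5}, {"b": 4, "c": 1}]
p = [frozenset({"a"}), frozenset({"b"})]
print(pattern_utility(s1, p))  # 9: 'a' from the 2nd itemset (5) + 'b' from the 3rd (4)
```

Enumerating occurrences this way is exponential in the worst case. That cost is why HUSPM algorithms rely on compact projections and upper bounds such as TRSU to skip candidates early.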
Graph Neural Networks (GNNs) have become increasingly important in recent years due to their state-of-the-art performance on many important downstream applications. Existing GNNs have mostly focused on learning a single node representation, even though a node often exhibits polysemous behavior in different contexts. In this work, we develop a persona-based graph neural network framework called PersonaSAGE that learns multiple persona-based embeddings for each node in the graph. Such disentangled representations are more interpretable and useful than a single embedding. Furthermore, PersonaSAGE learns the appropriate set of persona embeddings for each node, so different nodes can have different numbers of persona embeddings. The framework is flexible, and its general design allows the learned embeddings to be applied across a wide range of domains. We evaluate our approach against a variety of baselines on publicly available benchmark datasets. The experiments demonstrate the effectiveness of PersonaSAGE on a variety of important tasks. In link prediction, it achieves an average gain of 15%, and it remains competitive on node classification. Finally, we demonstrate the utility of PersonaSAGE with a case study on personalized recommendation of different entity types in a data management platform.
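The abstract does not specify how multiple persona embeddings are combined for link prediction. One plausible sketch assumes a dot-product decoder and scores a candidate edge by its best-matching pair of personas. This works even when the two nodes have different numbers of personas. `link_score` and the max-over-pairs rule are assumptions, not PersonaSAGE's published decoder.

```python
from typing import Sequence

Embedding = Sequence[float]


def link_score(personas_u: Sequence[Embedding],
               personas_v: Sequence[Embedding]) -> float:
    """Score a candidate edge (u, v) by its best-matching pair of persona
    embeddings; u and v may have different numbers of personas."""
    return max(sum(a * b for a, b in zip(pu, pv))
               for pu in personas_u for pv in personas_v)


u = [[1.0, 0.0], [0.0, 1.0]]   # two personas (e.g. two different contexts)
v = [[0.0, 2.0]]               # one persona
print(link_score(u, v))        # 2.0: v matches u's second persona
```

By contrast, collapsing u's personas into a single averaged embedding [0.5, 0.5] would score the same edge at only 1.0. This illustrates why a single representation can blur a node's distinct roles.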
With the development of natural language processing (NLP) techniques, automatic diagnosis of eye diseases using ophthalmology electronic medical records (OEMR) has become possible. The task aims to evaluate the condition of each of a patient's eyes separately, and in this paper we formulate it as a particular multi-label classification task. Although there are a few related studies on other diseases, automatic diagnosis of eye diseases exhibits unique characteristics. First, descriptions of both eyes are mixed up in OEMR documents, which contain both free text and templated asymptomatic descriptions, so the information is sparse and cluttered. Second, OEMR documents contain multiple parts of descriptions and are long. Third, it is critical for the disease diagnosis model to be explainable. To overcome these challenges, we present an effective automatic eye disease diagnosis framework, NEEDED. In this framework, a preprocessing module is integrated to improve the density and quality of information. Then, we design a hierarchical transformer structure for learning contextualized representations of each sentence in the OEMR document. For the diagnosis part, we propose an attention-based predictor that enables traceable diagnosis by obtaining disease-specific information. Experiments on a real-world dataset and comparisons with several baseline models show the advantages and explainability of our framework.
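An attention-based predictor of the kind described can be sketched with label-wise attention. A query vector per diagnosis label attends over the sentence representations, and the attended context feeds a sigmoid output. The attention weights then indicate which sentences support the diagnosis. This is a minimal sketch under assumed details: one query and one linear classifier per label, and dot-product attention. NEEDED's actual predictor, its per-eye label layout, and all names here are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def label_attention_predict(sentences: Sequence[Vector], query: Vector,
                            weight: Vector, bias: float) -> Tuple[float, List[float]]:
    """Probability of one (hypothetical) eye-disease label, plus the attention
    over sentences that shows which sentences drove the prediction."""
    attn = softmax([dot(h, query) for h in sentences])
    dim = len(sentences[0])
    context = [sum(a * h[d] for a, h in zip(attn, sentences)) for d in range(dim)]
    prob = 1.0 / (1.0 + math.exp(-(dot(weight, context) + bias)))
    return prob, attn


sentences = [[1.0, 0.0],   # e.g. a sentence describing a finding
             [0.0, 1.0]]   # e.g. a templated asymptomatic sentence
prob, attn = label_attention_predict(sentences, query=[4.0, 0.0],
                                     weight=[2.0, -1.0], bias=0.0)
print(round(prob, 3), [round(a, 3) for a in attn])
```

In a multi-label setting, this function would run once per label, with each label's own query and classifier. The returned `attn` list is what makes the diagnosis traceable to specific sentences.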